{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "raw_mimetype": "text/restructuredtext"
   },
   "source": [
    ".. _nb_subset_selection:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Subset Selection Problem\n",
    "\n",
    "A genetic algorithm can be used to approach subset selection problems by defining custom operators. In general, a metaheuristic algorithm might not be the ultimate goal to implement in a real-world scenario; however, it might be useful to investigate patterns or characteristics of possible well-performing subsets. \n",
    "Let us consider a simple toy problem where we have to select numbers from a list. For every solution, exactly ten numbers have to be selected that their sum is minimized.\n",
    "For the subset selection problem, a binary encoding can be used where **one** indicates a number is picked. In our problem formulation, the list of numbers is represented by $L$ and the binary encoded variable by $x$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\\begin{align}\n",
    "\\begin{split}\n",
    "\\min f(x) & = & \\sum_{k=1}^{n} L_k \\cdot x_k\\\\[2mm]\n",
    "\\text{s.t.} \\quad g(x)  & = & (\\sum_{k=1}^{n} x_k - 10)^2\\\\[2mm]\n",
    "\\end{split}\n",
    "\\end{align}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As shown above, the equality constraint is handled by ensuring $g(x)$ can only be zero if exactly ten numbers are chosen.\n",
    "The problem can be implemented as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from pymoo.model.problem import Problem\n",
    "\n",
    "class SubsetProblem(Problem):\n",
    "    def __init__(self,\n",
    "                 L,\n",
    "                 n_max\n",
    "                 ):\n",
    "        super().__init__(n_var=len(L), n_obj=1, n_constr=1, elementwise_evaluation=True)\n",
    "        self.L = L\n",
    "        self.n_max = n_max\n",
    "\n",
    "    def _evaluate(self, x, out, *args, **kwargs):\n",
    "        out[\"F\"] = np.sum(self.L[x])\n",
    "        out[\"G\"] = (self.n_max - np.sum(x)) ** 2\n",
    "    \n",
    "    \n",
    "# create the actual problem to be solved\n",
    "np.random.seed(1)\n",
    "L = np.array([np.random.randint(100) for _ in range(100)])\n",
    "n_max = 10\n",
    "problem = SubsetProblem(L, n_max)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The customization requires writing custom operators in order to solve this problem efficiently. We recommend considering the feasibility directly in the evolutionary operators because otherwise, most of the time, infeasible solutions will be processed.\n",
    "The sampling creates a random solution where the subset constraint will always be satisfied. \n",
    "The mutation randomly removes a number and chooses another one. The crossover takes the values of both parents and then randomly picks either the one from the first or from the second parent until enough numbers are picked."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pymoo.model.crossover import Crossover\n",
    "from pymoo.model.mutation import Mutation\n",
    "from pymoo.model.sampling import Sampling\n",
    "\n",
    "\n",
    "class MySampling(Sampling):\n",
    "\n",
    "    def _do(self, problem, n_samples, **kwargs):\n",
    "        X = np.full((n_samples, problem.n_var), False, dtype=np.bool)\n",
    "\n",
    "        for k in range(n_samples):\n",
    "            I = np.random.permutation(problem.n_var)[:problem.n_max]\n",
    "            X[k, I] = True\n",
    "\n",
    "        return X\n",
    "\n",
    "\n",
    "class BinaryCrossover(Crossover):\n",
    "    def __init__(self):\n",
    "        super().__init__(2, 1)\n",
    "\n",
    "    def _do(self, problem, X, **kwargs):\n",
    "        n_parents, n_matings, n_var = X.shape\n",
    "\n",
    "        _X = np.full((self.n_offsprings, n_matings, problem.n_var), False)\n",
    "\n",
    "        for k in range(n_matings):\n",
    "            p1, p2 = X[0, k], X[1, k]\n",
    "\n",
    "            both_are_true = np.logical_and(p1, p2)\n",
    "            _X[0, k, both_are_true] = True\n",
    "\n",
    "            n_remaining = problem.n_max - np.sum(both_are_true)\n",
    "\n",
    "            I = np.where(np.logical_xor(p1, p2))[0]\n",
    "\n",
    "            S = I[np.random.permutation(len(I))][:n_remaining]\n",
    "            _X[0, k, S] = True\n",
    "\n",
    "        return _X\n",
    "\n",
    "\n",
    "class MyMutation(Mutation):\n",
    "    def _do(self, problem, X, **kwargs):\n",
    "        for i in range(X.shape[0]):\n",
    "            X[i, :] = X[i, :]\n",
    "            is_false = np.where(np.logical_not(X[i, :]))[0]\n",
    "            is_true = np.where(X[i, :])[0]\n",
    "            X[i, np.random.choice(is_false)] = True\n",
    "            X[i, np.random.choice(is_true)] = False\n",
    "\n",
    "        return X"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After having defined the operators a genetic algorithm can be initialized."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "===========================================================================\n",
      "n_gen |  n_eval |   cv (min)   |   cv (avg)   |     fopt     |     favg    \n",
      "===========================================================================\n",
      "    1 |     100 |  0.00000E+00 |  0.00000E+00 |          258 |  4.43940E+02\n",
      "    2 |     200 |  0.00000E+00 |  0.00000E+00 |          205 |  3.54340E+02\n",
      "    3 |     300 |  0.00000E+00 |  0.00000E+00 |          175 |  3.00390E+02\n",
      "    4 |     400 |  0.00000E+00 |  0.00000E+00 |          161 |  2.52240E+02\n",
      "    5 |     500 |  0.00000E+00 |  0.00000E+00 |          124 |  2.18240E+02\n",
      "    6 |     600 |  0.00000E+00 |  0.00000E+00 |          118 |  1.91860E+02\n",
      "    7 |     700 |  0.00000E+00 |  0.00000E+00 |          118 |  1.70050E+02\n",
      "    8 |     800 |  0.00000E+00 |  0.00000E+00 |           85 |  1.49630E+02\n",
      "    9 |     900 |  0.00000E+00 |  0.00000E+00 |           85 |  1.39550E+02\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   10 |    1000 |  0.00000E+00 |  0.00000E+00 |           85 |  1.30580E+02\n",
      "   11 |    1100 |  0.00000E+00 |  0.00000E+00 |           74 |  1.19930E+02\n",
      "   12 |    1200 |  0.00000E+00 |  0.00000E+00 |           74 |  1.11800E+02\n",
      "   13 |    1300 |  0.00000E+00 |  0.00000E+00 |           74 |  1.06050E+02\n",
      "   14 |    1400 |  0.00000E+00 |  0.00000E+00 |           68 |  1.00100E+02\n",
      "   15 |    1500 |  0.00000E+00 |  0.00000E+00 |           68 |  9.46300E+01\n",
      "   16 |    1600 |  0.00000E+00 |  0.00000E+00 |           52 |  8.97600E+01\n",
      "   17 |    1700 |  0.00000E+00 |  0.00000E+00 |           50 |  8.45100E+01\n",
      "   18 |    1800 |  0.00000E+00 |  0.00000E+00 |           50 |  8.09000E+01\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   19 |    1900 |  0.00000E+00 |  0.00000E+00 |           50 |  7.66900E+01\n",
      "   20 |    2000 |  0.00000E+00 |  0.00000E+00 |           50 |  7.33700E+01\n",
      "   21 |    2100 |  0.00000E+00 |  0.00000E+00 |           45 |  7.00100E+01\n",
      "   22 |    2200 |  0.00000E+00 |  0.00000E+00 |           45 |  6.78900E+01\n",
      "   23 |    2300 |  0.00000E+00 |  0.00000E+00 |           45 |  6.60300E+01\n",
      "   24 |    2400 |  0.00000E+00 |  0.00000E+00 |           45 |  6.43500E+01\n",
      "   25 |    2500 |  0.00000E+00 |  0.00000E+00 |           45 |  6.36500E+01\n",
      "   26 |    2600 |  0.00000E+00 |  0.00000E+00 |           45 |  6.16100E+01\n",
      "   27 |    2700 |  0.00000E+00 |  0.00000E+00 |           45 |  6.03300E+01\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   28 |    2800 |  0.00000E+00 |  0.00000E+00 |           43 |  5.94900E+01\n",
      "   29 |    2900 |  0.00000E+00 |  0.00000E+00 |           43 |  5.76600E+01\n",
      "   30 |    3000 |  0.00000E+00 |  0.00000E+00 |           43 |  5.65100E+01\n",
      "   31 |    3100 |  0.00000E+00 |  0.00000E+00 |           42 |  5.54700E+01\n",
      "   32 |    3200 |  0.00000E+00 |  0.00000E+00 |           42 |  5.42800E+01\n",
      "   33 |    3300 |  0.00000E+00 |  0.00000E+00 |           42 |  5.36700E+01\n",
      "   34 |    3400 |  0.00000E+00 |  0.00000E+00 |           42 |  5.27000E+01\n",
      "   35 |    3500 |  0.00000E+00 |  0.00000E+00 |           42 |  5.21800E+01\n",
      "   36 |    3600 |  0.00000E+00 |  0.00000E+00 |           42 |  5.19700E+01\n",
      "   37 |    3700 |  0.00000E+00 |  0.00000E+00 |           39 |  5.14400E+01\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   38 |    3800 |  0.00000E+00 |  0.00000E+00 |           39 |  5.10500E+01\n",
      "   39 |    3900 |  0.00000E+00 |  0.00000E+00 |           39 |  5.08500E+01\n",
      "   40 |    4000 |  0.00000E+00 |  0.00000E+00 |           39 |  5.05500E+01\n",
      "   41 |    4100 |  0.00000E+00 |  0.00000E+00 |           39 |  5.01200E+01\n",
      "   42 |    4200 |  0.00000E+00 |  0.00000E+00 |           39 |  4.93800E+01\n",
      "   43 |    4300 |  0.00000E+00 |  0.00000E+00 |           39 |  4.88000E+01\n",
      "   44 |    4400 |  0.00000E+00 |  0.00000E+00 |           39 |  4.85700E+01\n",
      "   45 |    4500 |  0.00000E+00 |  0.00000E+00 |           39 |  4.81400E+01\n",
      "   46 |    4600 |  0.00000E+00 |  0.00000E+00 |           38 |  4.76700E+01\n",
      "   47 |    4700 |  0.00000E+00 |  0.00000E+00 |           38 |  4.71800E+01\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   48 |    4800 |  0.00000E+00 |  0.00000E+00 |           37 |  4.66100E+01\n",
      "   49 |    4900 |  0.00000E+00 |  0.00000E+00 |           37 |  4.62700E+01\n",
      "   50 |    5000 |  0.00000E+00 |  0.00000E+00 |           37 |  4.62000E+01\n",
      "   51 |    5100 |  0.00000E+00 |  0.00000E+00 |           37 |  4.57600E+01\n",
      "   52 |    5200 |  0.00000E+00 |  0.00000E+00 |           37 |  4.56000E+01\n",
      "   53 |    5300 |  0.00000E+00 |  0.00000E+00 |           37 |  4.53400E+01\n",
      "   54 |    5400 |  0.00000E+00 |  0.00000E+00 |           36 |  4.49200E+01\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   55 |    5500 |  0.00000E+00 |  0.00000E+00 |           36 |  4.46700E+01\n",
      "   56 |    5600 |  0.00000E+00 |  0.00000E+00 |           36 |  4.45100E+01\n",
      "   57 |    5700 |  0.00000E+00 |  0.00000E+00 |           36 |  4.43800E+01\n",
      "   58 |    5800 |  0.00000E+00 |  0.00000E+00 |           36 |  4.42300E+01\n",
      "   59 |    5900 |  0.00000E+00 |  0.00000E+00 |           36 |  4.41400E+01\n",
      "   60 |    6000 |  0.00000E+00 |  0.00000E+00 |           36 |  4.38100E+01\n",
      "Function value: 36\n",
      "Subset: [ 5  9 12 31 36 37 47 52 68 99]\n"
     ]
    }
   ],
   "source": [
    "from pymoo.algorithms.so_genetic_algorithm import GA\n",
    "from pymoo.optimize import minimize\n",
    "\n",
    "algorithm = GA(\n",
    "    pop_size=100,\n",
    "    sampling=MySampling(),\n",
    "    crossover=BinaryCrossover(),\n",
    "    mutation=MyMutation(),\n",
    "    eliminate_duplicates=True)\n",
    "\n",
    "res = minimize(problem,\n",
    "               algorithm,\n",
    "               ('n_gen', 60),\n",
    "               seed=1,\n",
    "               verbose=True)\n",
    "\n",
    "print(\"Function value: %s\" % res.F[0])\n",
    "print(\"Subset:\", np.where(res.X)[0])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we can compare the found subset with the optimum known simply through sorting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimal Subset: [ 5  9 12 31 36 37 47 52 68 99]\n",
      "Optimal Function Value: 36\n"
     ]
    }
   ],
   "source": [
    "opt = np.sort(np.argsort(L)[:n_max])\n",
    "print(\"Optimal Subset:\", opt)\n",
    "print(\"Optimal Function Value: %s\" % L[opt].sum())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
